65 research outputs found

Character-Aware Neural Language Models

Authors: Jernite Yacine; Kim Yoon; Rush Alexander M.; Sontag David
Publication date: 01/12/2015

We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.

Comment: AAAI 201
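The abstract above names a highway network but does not give its equations. As a rough illustration only, the sketch below follows the commonly used highway-layer formulation, y = t * g(W_h x + b_h) + (1 - t) * x with transform gate t = sigmoid(W_t x + b_t). The ReLU nonlinearity, the parameter names, and the pure-Python list representation are assumptions made for this example, not details taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Compute W x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway(x, W_h, b_h, W_t, b_t):
    """One highway layer (assumed form): mix a transformed input with the raw input.

    t close to 1 passes the ReLU-transformed features through;
    t close to 0 carries the input x through unchanged.
    """
    h = [max(0.0, v) for v in affine(W_h, b_h, x)]   # candidate features (ReLU assumed)
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]    # transform gate in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

In this sketch a strongly negative gate bias makes the layer behave like the identity, which is one reason such layers are often described as easy to stack.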

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Tree block coordinate descent for MAP in graphical models

Authors: Jaakkola Tommi S.; Sontag David Alexander
Publication venue: Journal of Machine Learning Research
Publication date: 01/01/2009

Abstract URL: http://jmlr.csail.mit.edu/proceedings/papers/v5/sontag09a.html

A number of linear programming relaxations have been proposed for finding most likely settings of the variables (MAP) in large probabilistic models. The relaxations are often succinctly expressed in the dual and reduce to different types of reparameterizations of the original model. The dual objectives are typically solved by performing local block coordinate descent steps. In this work, we show how to perform block coordinate descent on spanning trees of the graphical model. We also show how all of the earlier dual algorithms are related to each other, giving transformations from one type of reparameterization to another while maintaining monotonicity relative to a common objective function. Finally, we quantify when the MAP solution can and cannot be decoded directly from the dual LP relaxation.

CiteSeerX

DSpace@MIT

Cutting plane algorithms for variational inference in graphical models

Author: Sontag David Alexander
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2007

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (leaves 65-66).

In this thesis, we give a new class of outer bounds on the marginal polytope, and propose a cutting-plane algorithm for efficiently optimizing over these constraints. When combined with a concave upper bound on the entropy, this gives a new variational inference algorithm for probabilistic inference in discrete Markov Random Fields (MRFs). Valid constraints are derived for the marginal polytope through a series of projections onto the cut polytope. Projecting onto a larger model gives an efficient separation algorithm for a large class of valid inequalities arising from each of the original projections. As a result, we obtain tighter upper bounds on the log-partition function than possible with previous variational inference algorithms. We also show empirically that our approximations of the marginals are significantly more accurate. This algorithm can also be applied to the problem of finding the Maximum a Posteriori assignment in an MRF, which corresponds to a linear program over the marginal polytope. One of the main contributions of the thesis is to bring together two seemingly different fields, polyhedral combinatorics and probabilistic inference, showing how certain results in either field can carry over to the other.

By David Alexander Sontag. S.M

CiteSeerX

DSpace@MIT

Scaling all-pairs overlay routing

Authors: Andersen David G.; Karger David R.; Phanishayee Amar; Sontag David Alexander; Zhang Yang
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

This paper presents and experimentally evaluates a new algorithm for efficient one-hop link-state routing in full-mesh networks. Prior techniques for this setting scale poorly, as each node incurs quadratic (n^2) communication overhead to broadcast its link state to all other nodes. In contrast, in our algorithm each node exchanges routing state with only a small subset of overlay nodes determined by using a quorum system. Using a two-round protocol, each node can find an optimal one-hop path to any other node using only n^1.5 per-node communication. Our algorithm can also be used to find the optimal shortest path of arbitrary length using only n^1.5 log n per-node communication. The algorithm is designed to be resilient to both node and link failures. We apply this algorithm to a Resilient Overlay Network (RON) system, and evaluate the results using a large-scale, globally distributed set of Internet hosts. The reduced communication overhead from using our improved full-mesh algorithm allows the creation of all-pairs routing overlays that scale to hundreds of nodes, without reducing the system's ability to rapidly find optimal routes.

National Science Foundation (U.S.). National Science Foundation (U.S.). Graduate Research Fellowship Progra
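The abstract above says each node exchanges routing state only with a quorum, without specifying the construction. One standard construction that matches the stated n^1.5 per-node cost is a grid quorum, sketched below under stated assumptions: n is a perfect square, nodes are numbered 0..n-1, link costs are symmetric, and each link-state vector has a zero entry for the node itself. The function names are hypothetical and this is not presented as the paper's implementation.

```python
import math


def grid_quorum(node, n):
    """Return the row and column of `node` in a sqrt(n) x sqrt(n) grid.

    Any two such quorums share at least one member (a rendezvous node),
    and each quorum has 2*sqrt(n) - 1 members.
    """
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("this sketch assumes n is a perfect square")
    row, col = divmod(node, side)
    row_members = {row * side + c for c in range(side)}
    col_members = {r * side + col for r in range(side)}
    return row_members | col_members


def best_one_hop(link_state_a, link_state_b):
    """At a rendezvous node holding both link-state vectors, pick the best relay.

    link_state_a[k] is the cost between a and k, link_state_b[k] the cost
    between b and k (symmetric costs assumed). Choosing k == b corresponds
    to the direct path, since link_state_b[b] is 0.
    """
    return min(range(len(link_state_a)),
               key=lambda k: link_state_a[k] + link_state_b[k])
```

Under these assumptions, a node sends its length-n link state to roughly 2*sqrt(n) quorum members in the first round, which is on the order of n^1.5 communication per node; in the second round each rendezvous node can return the relays computed by `best_one_hop`.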

CiteSeerX

DSpace@MIT

Learning efficiently with approximate inference via dual losses

Authors: Globerson Amir; Jaakkola Tommi S.; Meshi Ofer; Sontag David Alexander
Publication venue: International Machine Learning Society
Publication date: 01/01/2010

Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.

CiteSeerX

DSpace@MIT

Learning Bayesian network structure using LP relaxations

Authors: Globerson Amir; Jaakkola Tommi S.; Meila Marina; Sontag David Alexander
Publication venue: Society for Artificial Intelligence and Statistics
Publication date: 01/05/2010

We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid acyclic structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal Bayesian network. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. Empirical results suggest that the method is competitive with or faster than alternative exact methods based on dynamic programming.

CiteSeerX

DSpace@MIT

PClean: Bayesian Data Cleaning at Scale with Domain-Specific Probabilistic Programming

Authors: Agrawal Monica; Lew Alexander K.; Mansinghka Vikash K.; Sontag David
Publication date: 07/08/2020

Data cleaning is naturally framed as probabilistic inference in a generative model, combining a prior distribution over ground-truth databases with a likelihood that models the noisy channel by which the data are filtered, corrupted, and joined to yield incomplete, dirty, and denormalized datasets. Based on this view, we present PClean, a unified generative modeling architecture for cleaning and normalizing dirty data in diverse domains. Given an unclean dataset and a probabilistic program encoding relevant domain knowledge, PClean learns a structured representation of the data as a relational database of interrelated objects, and uses this latent structure to impute missing values, identify duplicates, detect errors, and propose corrections in the original data table. PClean makes three modeling and inference contributions: (i) a domain-general non-parametric generative model of relational data, for inferring latent objects and their network of latent connections; (ii) a domain-specific probabilistic programming language, for encoding domain knowledge specific to each dataset being cleaned; and (iii) a domain-general inference engine that adapts to each PClean program by constructing data-driven proposals used in sequential Monte Carlo and particle Gibbs. We show empirically that short (< 50-line) PClean programs deliver higher accuracy than state-of-the-art data cleaning systems based on machine learning and weighted logic; that PClean's inference algorithm is faster than generic particle Gibbs inference for probabilistic programs; and that PClean scales to large real-world datasets with millions of rows.

Comment: Added references; revised abstrac

arXiv.org e-Print Archive

Overcomplete Independent Component Analysis via SDP

Authors: Bach Francis; d'Aspremont Alexandre; Perry Amelia; Podosinnikova Anastasia; Sontag David; Wein Alexander
Publication date: 24/01/2019

We present a novel algorithm for overcomplete independent components analysis (ICA), where the number of latent sources k exceeds the dimension p of observed variables. Previous algorithms either suffer from high computational complexity or make strong assumptions about the form of the mixing matrix. Our algorithm does not make any sparsity assumption yet enjoys favorable computational and theoretical properties. Our algorithm consists of two main steps: (a) estimation of the Hessians of the cumulant generating function (as opposed to the fourth and higher order cumulants used by most algorithms) and (b) a novel semi-definite programming (SDP) relaxation for recovering a mixing component. We show that this relaxation can be efficiently solved with a projected accelerated gradient descent method, which makes the whole algorithm computationally practical. Moreover, we conjecture that the proposed program recovers a mixing component at the rate k < p^2/4 and prove that a mixing component can be recovered with high probability when k < (2 - epsilon) p log p, provided the original components are sampled uniformly at random on the hypersphere. Experiments are provided on synthetic data and the CIFAR-10 dataset of real images.

Comment: Appears in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). 21 page

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Exploring the sacrality of reading as a social practice

Authors: Adam Reed; Alberto Manguel; Alfred Schutz; Amaranth Borsuk; Andrew Abbott; Angela Xiao Wu; Antoine Hennion; Ashleigh Watson; Bernard Williams; Beth Driscoll; C. Clayton Childress; Charles Altieri; Charles Taylor; Charles Turner; Clayton Childress; Cristina Simko; David Barton; David Beer; David Frisby; Elizabeth Long; Emile Gomart; George Steiner; Hans Robert Jauss; Helen Taylor; Howard S Becker; Ineke Nagel; Isaac Reed; J Alexander; James Baldwin; James Phelan; Jay David Bolter; Jeanette Winterson; Jeffrey Alexander; Jenny Hartley; Jerome Seymour Bruner; Jonathan Rose; Josie Billington; José Ossandón; Jurgen Habermas; JW Meyer; Leah Price; Luc Boltanski; Lynette Spillman; Marco Solaroli; Mary Douglas; Maryanne Wolf; María Angélica Thumala Olave; Megan Sweeny; Monica Lee; Natalie Heinich; Paul Ricoeur; R Dunch; Richard Biernacki; Richard H Brown; Richard Sennett; Richard Swedberg; Rita Felski; Roger Chartier; Susan Sontag; TG Kirsch; Theodore G Striphas; Timothy Aubry; Tony Bennett; Wendy Griswold; Wolf Lepenies; Wolfgang Iser
Publication venue: Springer Science and Business Media LLC
Publication date: 03/04/2021

Crossref

Edinburgh Research Explorer